We present an extension to masked autoencoders (MAE) which improves on the representations learnt by the model by explicitly encouraging the learning of higher scene-level features. We do this by: (i) the introduction of a perceptual similarity term between generated and real images (ii) incorporating several techniques from the adversarial training literature including multi-scale training and adaptive discriminator augmentation. The combination of these results in not only better pixel reconstruction but also representations which appear to capture better higher-level details within images. More consequentially, we show how our method, Perceptual MAE, leads to better performance when used for downstream tasks outperforming previous methods. We achieve 78.1% top-1 accuracy linear probing on ImageNet-1K and up to 88.1% when fine-tuning, with similar results for other downstream tasks, all without use of additional pre-trained models or data.
translated by 谷歌翻译
Knowledge representation and reasoning in law are essential to facilitate the automation of legal analysis and decision-making tasks. In this paper, we propose a new approach based on legal science, specifically legal taxonomy, for representing and reasoning with legal documents. Our approach interprets the regulations in legal documents as binary trees, which facilitates legal reasoning systems to make decisions and resolve logical contradictions. The advantages of this approach are twofold. First, legal reasoning can be performed on the basis of the binary tree representation of the regulations. Second, the binary tree representation of the regulations is more understandable than the existing sentence-based representations. We provide an example of how our approach can be used to interpret the regulations in a legal document.
translated by 谷歌翻译
Realistic synthetic image data rendered from 3D models can be used to augment image sets and train image classification semantic segmentation models. In this work, we explore how high quality physically-based rendering and domain randomization can efficiently create a large synthetic dataset based on production 3D CAD models of a real vehicle. We use this dataset to quantify the effectiveness of synthetic augmentation using U-net and Double-U-net models. We found that, for this domain, synthetic images were an effective technique for augmenting limited sets of real training data. We observed that models trained on purely synthetic images had a very low mean prediction IoU on real validation images. We also observed that adding even very small amounts of real images to a synthetic dataset greatly improved accuracy, and that models trained on datasets augmented with synthetic images were more accurate than those trained on real images alone. Finally, we found that in use cases that benefit from incremental training or model specialization, pretraining a base model on synthetic images provided a sizeable reduction in the training cost of transfer learning, allowing up to 90\% of the model training to be front-loaded.
translated by 谷歌翻译
Node classification on graph data is a major problem, and various graph neural networks (GNNs) have been proposed. Variants of GNNs such as H2GCN and CPF outperform graph convolutional networks (GCNs) by improving on the weaknesses of the traditional GNN. However, there are some graph data which these GNN variants fail to perform well than other GNNs in the node classification task. This is because H2GCN has a feature thinning on graph data with high average degree, and CPF gives rise to a problem about label-propagation suitability. Accordingly, we propose a hierarchical model selection framework (HMSF) that selects an appropriate GNN model by analyzing the indicators of each graph data. In the experiment, we show that the model selected by our HMSF achieves high performance on node classification for various types of graph data.
translated by 谷歌翻译
Privacy noise may negate the benefits of using adaptive optimizers in differentially private model training. Prior works typically address this issue by using auxiliary information (e.g., public data) to boost the effectiveness of adaptive optimization. In this work, we explore techniques to estimate and efficiently adapt to gradient geometry in private adaptive optimization without auxiliary data. Motivated by the observation that adaptive methods can tolerate stale preconditioners, we propose differentially private adaptive training with delayed preconditioners (DP^2), a simple method that constructs delayed but less noisy preconditioners to better realize the benefits of adaptivity. Theoretically, we provide convergence guarantees for our method for both convex and non-convex problems, and analyze trade-offs between delay and privacy noise reduction. Empirically, we explore DP^2 across several real-world datasets, demonstrating that it can improve convergence speed by as much as 4x relative to non-adaptive baselines and match the performance of state-of-the-art optimization methods that require auxiliary data.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
Robotic force-based compliance control is a preferred approach to achieve high-precision assembly tasks. When the geometric features of assembly objects are asymmetric or irregular, reinforcement learning (RL) agents are gradually incorporated into the compliance controller to adapt to complex force-pose mapping which is hard to model analytically. Since force-pose mapping is strongly dependent on geometric features, a compliance controller is only optimal for current geometric features. To reduce the learning cost of assembly objects with different geometric features, this paper is devoted to answering how to reconfigure existing controllers for new assembly objects with different geometric features. In this paper, model-based parameters are first reconfigured based on the proposed Equivalent Theory of Compliance Law (ETCL). Then the RL agent is transferred based on the proposed Weighted Dimensional Policy Distillation (WDPD) method. The experiment results demonstrate that the control reconfiguration method costs less time and achieves better control performance, which confirms the validity of proposed methods.
translated by 谷歌翻译
Vascular shunt insertion is a fundamental surgical procedure used to temporarily restore blood flow to tissues. It is often performed in the field after major trauma. We formulate a problem of automated vascular shunt insertion and propose a pipeline to perform Automated Vascular Shunt Insertion (AVSI) using a da Vinci Research Kit. The pipeline uses a learned visual model to estimate the locus of the vessel rim, plans a grasp on the rim, and moves to grasp at that point. The first robot gripper then pulls the rim to stretch open the vessel with a dilation motion. The second robot gripper then proceeds to insert a shunt into the vessel phantom (a model of the blood vessel) with a chamfer tilt followed by a screw motion. Results suggest that AVSI achieves a high success rate even with tight tolerances and varying vessel orientations up to 30{\deg}. Supplementary material, dataset, videos, and visualizations can be found at https://sites.google.com/berkeley.edu/autolab-avsi.
translated by 谷歌翻译
电缆在房屋,医院和工业仓库中很普遍,容易纠结。本文通过引入新颖的不确定性定量指标和与电缆相互作用以减少感知不确定性相互作用的新型不确定性定量指标和动作,扩展了对自动释放长电缆的先前工作。我们为Tangle操纵2.0(SGTM 2.0)提供了滑动和握力,该系统使用双边机器人自动解开大约3米长的电缆,并使用每个步骤的不确定性估算值估计,以告知动作。通过互动降低不确定性,缠结操作2.0(SGTM 2.0)的滑动和握住可以减少其必须采用的状态排列动作的数量,从而大大加快运行时间。实验表明,SGTM 2.0可以在1或2台上和图8节的电缆上取得83%的脱节成功,并且在这些配置中的70%终止检测成功,在无障碍精度上优于SGTM 1.0,超过43%,在全部推出速度上超过200% 。可以在sites.google.com/view/sgtm2上找到补充材料,可视化和视频。
translated by 谷歌翻译
人类广泛利用视觉和触摸作为互补的感官,视觉提供有关场景的全球信息,并在操纵过程中触摸当地信息而不会受到阻塞。在这项工作中,我们提出了一个新颖的框架,用于以一种自我监督的方式学习多任务视觉执行表示。我们设计了一种机制,该机制使机器人能够自主收集空间对齐的视觉和触觉数据,这是下游任务的关键属性。然后,我们使用交叉模式对比损失训练视觉和触觉编码器将这些配对的感觉输入嵌入共享潜在空间中。对学习的表示形式进行评估,而无需对5个感知和控制任务进行微调,涉及可变形表面:触觉分类,接触定位,异常检测(例如,手术幻影肿瘤触诊),触觉搜索,例如,视觉疑问(例如,在遮挡的情况下,都可以从视觉询问中进行触觉搜索),以及沿布边缘和电缆的触觉伺服。博学的表示形式在毛巾功能分类上达到了80%的成功率,手术材料中异常检测的平均成功率为73%,视觉引导触觉搜索的平均成功率和87.8%的平均伺服距离沿电缆和服装的平均伺服距离为87.8%。接缝。这些结果表明,学习的表示形式的灵活性,并朝着对机器人控制的任务不合时宜的视觉表达表示迈出了一步。
translated by 谷歌翻译